ABSTRACT

The functions of non-B DNA structures are poorly understood, although several bioinformatics studies predict a role for the G-quadruplex DNA structure in transcription. Earlier, using transcriptome profiling, we found evidence of widespread G-quadruplex-mediated gene regulation. Herein, we asked whether potential G-quadruplex (PG4) motifs associate with transcription factors (TFs). This was analyzed using 220 position weight matrices [designated as transcription factor binding sites (TFBS)], representing 187 unique TFs, in >75 000 genes in human, chimpanzee, mouse and rat. Results show that binding sites of nine TFs, including those of AP-2, SP1, MAZ and VDR, occurred significantly within 100 bases of the PG4 motif (P ≤ 1.24E-10). PG4-TFBS combinations were conserved in 'orthologously' related promoters across all four organisms and were associated with >850 genes in each genome. Remarkably, seven of the nine TFs were zinc-finger binding proteins, indicating a novel characteristic of PG4 motifs. To test these findings, we used transcriptome profiles from human cell lines treated with G-quadruplex-specific molecules. In these profiles, 66 genes were significantly differentially expressed across both cell types, and these genes also harbored conserved PG4 motifs along with one or more of the nine TFBS. In addition, genes regulated by PG4-TFBS combinations were found to be co-regulated in human tissues, further emphasizing the regulatory significance of the associations.



INTRODUCTION 

The regulation of gene expression in eukaryotes is highly complex and often occurs through the coordinated action of multiple transcription factors (TFs). A simplistic model posits that specific DNA sequence motifs, or cis-regulatory elements, dictate the binding of TFs, leading to activation or repression of genes. Emerging evidence suggests that a subset of such cis-regulatory elements may adopt distinct conformation(s) that additionally specify TF-DNA interactions. In this context, it is interesting to consider the DNA secondary structures adopted by guanine-rich sequences called G-quadruplexes (or G4 DNA). These are a unique self-arrangement of Hoogsteen base-paired, intramolecular or intermolecular associations of DNA strands in parallel or antiparallel orientation, stabilized by charge coordination with monovalent cations (especially K+) (1-4).

A large volume of evidence from genome-wide computational studies suggests that potential G4 (PG4) motifs are prevalent in the promoters of a wide range of species. This was first observed in a genome-wide study of 18 bacterial species, in which PG4 motifs were found to be enriched within regulatory regions (5). The same held when >140 bacteria were tested (6). Further studies showed enrichment of PG4 motifs in the promoters of the human (7,8), chimpanzee (8), mouse (8), rat (8) and chicken (9) genomes. Moreover, numerous human promoter PG4 motifs were found to be conserved within the corresponding mouse and rat promoters (8). In addition, emerging evidence suggests a role for G-quadruplexes in chromatin packaging (10-12), recombination (13) and CpG methylation (14).

In vitro evidence for a functional role of the G-quadruplex structure in transcription has been shown for a few genes. c-MYC was the first case: a G-quadruplex-forming sequence in the nuclease hypersensitive element upstream of the P1 promoter was shown to affect c-MYC transcription (15). Similarly, transcription was influenced by G-quadruplex-forming sequence motifs within the core promoters of the human c-KIT (16) and KRAS (17) oncogenes. Promoter G-quadruplexes have also been reported for a number of other genes, such as VEGF, PDGF, HIF1α, BCL-2, RB and RET (18). They have also been reported for thymidine kinase 1, where a non-canonical G-quadruplex motif formed from repeats of two guanines instead of three was found to be functionally active (19). In line with these studies, transcriptome profiling of human cancer cells indicated changes in gene expression in the presence of established intracellular G-quadruplex-binding ligands, suggesting a genome-wide role of G-quadruplex motifs in transcription (20).

Encouraged by these findings, attempts were made to probe the involvement of potential trans factors in the recognition of G-quadruplex motifs. Using chromatin immunoprecipitation (ChIP) assays, we recently demonstrated that the non-metastatic factor NM23-H2 binds to the c-MYC promoter via a G-quadruplex element (21). Other G-quadruplex-protein interactions have also been shown:

- recombinant hnRNP A1/Up1 with the KRAS promoter G-quadruplex (22);
- Myc-associated zinc-finger protein (MAZ)/poly(ADP-ribose) polymerase 1 (PARP-1) with the G-quadruplex element in the murine KRAS promoter (23);
- nucleolin/hnRNP proteins with the G-quadruplex-forming sequences of the VEGF promoter (24).

Moreover, G-quadruplex motifs in the promoters of three muscle-specific genes were shown to bind the homodimeric form of the TF MyoD in vitro (25). These genes are human sarcomeric mitochondrial creatine kinase, and muscle creatine kinase and integrin α7 of mouse. These studies suggest G-quadruplex-TF interactions as possible regulatory mechanisms. However, because they focus on individual promoters and TFs, the full scope of such structure-specific interactions remains untested.

We hypothesized that functionally active quadruplex motifs must associate with one or more TFs. Many PG4 motifs are found near transcription start sites (TSS), so we reasoned that, as a first approximation, those most likely to be functional would be conserved across species. With this in mind, and using the strategy shown in Figure 1, we sought to identify PG4-transcription factor binding site (TFBS) associations genome-wide in human, chimpanzee, mouse and rat. Findings were tested using genome-wide transcriptome profiling data generated in two cell lines after treatment with a molecule that binds quadruplex motifs inside cells. Further validation was obtained from the tissue-specific expression of genes harboring PG4-TFBS combinations.



MATERIALS AND METHODS 

Sequence retrieval and analysis 

We retrieved the ±2-kb regions centered at the annotated TSS of 20 664 human, 20 601 chimpanzee, 19 656 mouse and 15 162 rat non-redundant promoter sequences from UCSC. The builds were hg18 for human, panTro2 for chimpanzee, mm9 for mouse and rn4 for rat. PG4 motif-forming sequences with stem size three were searched within these promoters with a customized algorithm, as described earlier (5). Briefly, we adopted the general pattern $G_n N_{L1} G_n N_{L2} G_n N_{L3} G_n$, where:

- G is guanine;
- N is any nucleotide, including G;
- n = 3-5, held constant within a single motif;
- the number of nucleotides in each loop (L1, L2 and L3) can vary from 1 to 7.

The program was rerun with cytosine (C) instead of guanine (G) to identify motifs on the complementary strand, and the results were corrected for strand orientation. We restricted our program to a stem size of 3 and a loop length of 1-7 because most in vitro characterizations and experiments have used these guidelines for PG4 motifs. Recent work shows, however, that non-canonical motifs with varying loop and stem sizes are also possible (5,19,26).
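For illustration, the motif definition above can be written as a regular expression. Its settings are stem n = 3 and loops of 1-7 nt, with the C-rich pattern standing in for the complementary strand. This is a minimal Python sketch, not the customized program used in the study. The function names `pg4_regex` and `find_pg4` are hypothetical. Lazy loop matching and non-overlapping hits are also assumptions, since the original handling of overlapping motifs is not described.

```python
import re


def pg4_regex(base: str, stem: int = 3, min_loop: int = 1, max_loop: int = 7) -> re.Pattern:
    """Build G_n N_L1 G_n N_L2 G_n N_L3 G_n (or its C-rich counterpart) as a regex."""
    run = f"{base}{{{stem}}}"                        # e.g. "G{3}"
    loop = f"[ACGTN]{{{min_loop},{max_loop}}}?"      # lazy loop; N may include G
    return re.compile(loop.join([run] * 4))          # run loop run loop run loop run


def find_pg4(seq: str, stem: int = 3, min_loop: int = 1, max_loop: int = 7):
    """Return sorted (start, end, strand) tuples for non-overlapping PG4 hits.

    Coordinates refer to the input strand. A C-rich hit ('-') corresponds to a
    G-rich motif on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for base, strand in (("G", "+"), ("C", "-")):
        for m in pg4_regex(base, stem, min_loop, max_loop).finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)


print(find_pg4("AAGGGAGGGAGGGAGGGAA"))  # [(2, 17, '+')]
```

Passing `stem=4` or `stem=5` gives the other stem sizes of the general n = 3-5 pattern. The study itself restricted the search to n = 3.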

Analysis of TFBS 

Conservation of PG4 motifs in orthologous promoters of human, chimpanzee, mouse and rat was analyzed using algorithms previously published by us (8). Herein, we extended our previous study to include chimpanzee and used NCBI HomoloGene for ortholog information. We started from human genes having at least one PG4 motif within ±2 kb of the TSS. For these, we searched for the corresponding promoter regions in chimpanzee, mouse and rat, retrieving 13 437 human-chimpanzee, 14 940 human-mouse and 13 764 human-rat promoter pairs. For each promoter pair, we searched the corresponding chimpanzee, mouse and rat promoter for PG4 motif(s) within 200 bases of the human PG4 motif position (Figure 2). These promoter pairs were considered for further analysis and designated as:

- PG4CP-H (PG4 conserved promoter set, human);
- PG4CP-C (PG4 conserved promoter set, chimpanzee);
- PG4CP-M (PG4 conserved promoter set, mouse);
- PG4CP-R (PG4 conserved promoter set, rat).
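The positional conservation criterion can be sketched as a window check. This minimal Python sketch assumes that PG4 positions are TSS-relative coordinates in each species. It also assumes that a human motif counts as conserved only when every ortholog has a PG4 within ±200 bases of it. The helper `conserved_pg4` is hypothetical and is not the published algorithm (8).

```python
def conserved_pg4(human_positions, ortholog_positions, window=200):
    """Keep human PG4 positions that have a PG4 within +/-window bases in every ortholog.

    human_positions: TSS-relative PG4 positions in the human promoter (assumed).
    ortholog_positions: dict mapping species name -> TSS-relative PG4 positions.
    """
    conserved = []
    for h in human_positions:
        if all(any(abs(o - h) <= window for o in hits)
               for hits in ortholog_positions.values()):
            conserved.append(h)
    return conserved


orthologs = {"chimpanzee": [-140], "mouse": [-300, 950], "rat": [10]}
print(conserved_pg4([-150, 900], orthologs))  # [-150]
```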

We considered 220 PWMs, representing 187 unique TFs, as potential TFBS. PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R were analyzed for the presence of these TFBS using MATCH™ (TRANSFAC® Professional 12.1) (27). To analyze the enrichment of TFBS elements on conserved-set promoters, we used the remaining promoters (i.e. excluding the conserved set) as a control set. The total occurrence of any given TFBS on each conserved-set promoter was taken as the observed frequency. Its occurrence in control-set promoters gave the randomly expected frequency. The discrepancy between observed and expected frequencies was evaluated with the chi-square statistic (χ2), independently for human, chimpanzee, mouse and rat.
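The text does not spell out how per-promoter counts enter the χ2 statistic. One reading is consistent with the n − 1 degrees of freedom quoted below for SP1. Under that reading, the statistic is a goodness-of-fit sum over conserved-set promoters against the mean control-set count per promoter. This Python sketch assumes that reading, and `chi_square_enrichment` is a hypothetical helper. The p-value uses the Wilson-Hilferty normal approximation so that only the standard library is needed. This approximation is reasonable for the hundreds of degrees of freedom involved here.

```python
from statistics import NormalDist, fmean


def chi_square_enrichment(observed_per_promoter, control_per_promoter):
    """Chi-square of per-promoter TFBS counts against the mean control-set count."""
    expected = fmean(control_per_promoter)            # randomly expected count per promoter
    chi2 = sum((o - expected) ** 2 / expected for o in observed_per_promoter)
    df = len(observed_per_promoter) - 1               # n - 1 degrees of freedom
    return chi2, df


def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail P(X >= x) for a chi-square(df) variable (Wilson-Hilferty)."""
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / h ** 0.5
    return NormalDist().cdf(-z)


chi2, df = chi_square_enrichment([3, 5, 4], [2, 1, 3, 2])
print(chi2, df)  # 7.0 2
```

For extremely large statistics the approximate p-value underflows to 0.0 in double precision. Values such as "<E-300" in the tables would need an exact log-scale tail computation instead.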

Nucleic Acids Research, 2011, Vol. 39, No. 18 8007

[Figure 1 flowchart:
1. 20 664 human, 20 601 chimpanzee, 19 656 mouse and 15 162 rat promoters.
2. Promoters of 'orthologous' genes in all four species.
3. 871 'orthologous' promoters with a conserved PG4 motif (human 1563, chimpanzee 1666, mouse 1459, rat 1350 motifs).
4. Identification of TFs with enriched TFBS in 'orthologous' promoters having a conserved PG4 motif.
5. 45 common enriched TFs in human, chimpanzee, mouse and rat.
6. TFBS within 100 bases of the PG4 motif.
7. Experimental validation: gene expression analysis in two cell lines after treatment with a G-quadruplex-binding ligand, and meta-analysis of gene expression in 68 normal human tissues.]

Figure 1. Flowchart summarizing the approach adopted in this study. Schema of the strategy followed to test genome-wide association of TFs with PG4 motif(s).

PG4-TFBS inter-distance analysis

We used the positions of the conserved PG4 motifs and of the TFBS found to be significantly enriched on the PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R sets. For each promoter, all n×n PG4-TFBS combinations were generated, and their inter-distances (distance between the conserved PG4 motif and the TFBS) were calculated. The inter-distance values were then grouped in bins of 100 bases, and the percentage frequency within each bin was calculated.
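The pairing and binning steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions. Positions are integers (for example, the 5' base of each element relative to the TSS). The inter-distance is signed as TFBS minus PG4. Bins are the floor of the distance in 100-base steps, keyed by the bin start. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def interdistance_histogram(promoters, bin_size=100):
    """Percentage of PG4-TFBS pairs per signed inter-distance bin, for one TF.

    promoters: iterable of (pg4_positions, tfbs_positions), one entry per promoter.
    """
    counts = Counter()
    for pg4_positions, tfbs_positions in promoters:
        # Every PG4 is paired with every TFBS in the same promoter (n*n combinations).
        for pg4, tfbs in product(pg4_positions, tfbs_positions):
            counts[((tfbs - pg4) // bin_size) * bin_size] += 1  # key = bin start
    total = sum(counts.values())
    if total == 0:
        return {}
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


toy = [([0], [-150, 30]), ([1000], [1060, 1250])]
print(interdistance_histogram(toy))  # {-200: 25.0, 0: 50.0, 200: 25.0}
```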



Analysis of PG4 motif co-occurrence with TFBS elements 

We analyzed the significance of co-occurrence of each TFBS with the PG4 motif individually on the PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R sets. First, we evaluated the randomly expected co-occurrence frequency of each individual TFBS with the PG4 motif. The actual promoter-wise co-occurrence of each TFBS element with the PG4 motif was then compared with this random expectation to assess significance. This is based on a previously published method (28). Briefly, F(f1, f2) is the expected frequency of co-occurrence of an individual TFBS with a PG4 motif within m base pairs (window size) in any n-base-pair-long sequence. It is given by

$$F(f_1, f_2) = \frac{F(f_1)\,F(f_2)\,\left[(2n - m)(m + 1) - n\right]}{n^2}$$

where F(f1) is the promoter-wise expected frequency of the PG4 motif and F(f2) is the promoter-wise expected frequency of the individual TFBS. In our case, m is 200 bases and n is 4000 bases.
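The bracketed term counts the ordered position pairs (i, j) in a sequence of length n that lie within m bases of each other. There are n pairs with i = j, plus 2(n − d) pairs at each distance d = 1..m, which sums to (2n − m)(m + 1) − n. Dividing by n² gives the chance that two independently and uniformly placed sites fall within the window. This short Python sketch assumes that uniform-placement model and uses hypothetical function names. It evaluates the formula and checks the pair count by brute force on a small sequence.

```python
def window_pair_count(n, m):
    """Number of ordered position pairs (i, j) in 0..n-1 with |i - j| <= m (requires m < n)."""
    return (2 * n - m) * (m + 1) - n


def expected_cooccurrence(f1, f2, n=4000, m=200):
    """F(f1, f2) = F(f1) F(f2) [(2n - m)(m + 1) - n] / n^2."""
    return f1 * f2 * window_pair_count(n, m) / (n * n)


# Brute-force check of the closed-form pair count on a short sequence.
n_small, m_small = 50, 7
brute = sum(1 for i in range(n_small) for j in range(n_small) if abs(i - j) <= m_small)
assert brute == window_pair_count(n_small, m_small)

print(window_pair_count(4000, 200))  # 1563800
```

With n = 4000 and m = 200, one PG4 and one TFBS per promoter are expected to co-occur in about 9.8% of promoters by chance (1 563 800 / 16 000 000).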

Figure 2. PG4 motif positional conservation across orthologously related promoters. Scheme showing identification of conserved PG4 motifs within ±200 bases in orthologously related promoters (±2 kb of TSS) of human, chimpanzee, mouse and rat, and search for associated TFBS. H represents a human gene, and C, M and R represent its orthologs in chimpanzee, mouse and rat, respectively.

We obtained the actual co-occurrence frequencies of the PG4 motif and each individual TFBS within 200 bases in the PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R sequence sets. To do so, we queried the promoter-wise TFBS position and PG4 motif conservation files using in-house Perl scripts. To calculate the statistical significance of co-occurrence, a χ2 test was performed for each individual TFBS. For example, consider n − 1 degrees of freedom (e.g. n − 1 = 699 for SP1 in human). A reasonable significance level to exclude false positives with a simple Bonferroni correction would then be P = 0.005/699 = 7.15 × 10^-6, which corresponds to χ2 = 873.26.
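The threshold arithmetic can be reproduced in a few lines. This Python sketch is illustrative only and follows the text in dividing 0.005 by the 699 degrees of freedom. It approximates the χ2 critical value with the Wilson-Hilferty transformation rather than an exact inverse chi-square CDF. Its result should therefore land close to, but not necessarily exactly on, the quoted 873.26. The helper name is hypothetical.

```python
from statistics import NormalDist


def chi2_critical_wilson_hilferty(p_upper, df):
    """Approximate chi-square value whose upper-tail probability is p_upper."""
    z = NormalDist().inv_cdf(1.0 - p_upper)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3


df = 699                  # n - 1 for SP1 in human, as quoted in the text
alpha = 0.005 / df        # simple Bonferroni correction used in the text
crit = chi2_critical_wilson_hilferty(alpha, df)
print(f"alpha = {alpha:.3g}, chi2 critical ~ {crit:.1f}")
```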

Analysis of PG4-TFBS enrichment on differentially 
expressed genes 

This analysis used genome-wide expression data for HeLa S3 and A549 cells after treatment with TMPyP4, previously published from our laboratory (20). The 1161 differentially expressed genes (863 up- and 298 downregulated at <20% FDR) were compared with the human conserved-set genes (PG4CP-H). For the genes common to both sets, the ±2-kb sequences centered at the TSS were analyzed for enrichment of nine TFs. These nine TFs had earlier been shown to be enriched within 100 bases of the conserved PG4 motif. The total occurrence of each individual TFBS on each of these genes was taken as the observed frequency. The expected frequencies for these nine TFBS were obtained as described above for the TFBS enrichment analysis. A χ2 test was performed for each TF to assess statistical significance.



Comparison with experimentally determined ChIP-seq/ChIP-on-chip TFBS

We compared the in silico binding positions predicted using TRANSFAC for the human conserved-set promoters with experimentally determined data that were publicly available at the time of this study. These were ChIP-on-chip (ChIP followed by microarray) data for SP1 (29) and NF-Y (29), and ChIP-seq (ChIP followed by parallel sequencing) data for STAT1 (30). The sequences of the ChIP-on-chip and ChIP-seq binding-coordinate intervals for common promoters were fetched from UCSC (hg18). They were searched for SP1 and NF-Y binding positions using TRANSFAC consensus motifs (PWMs). For STAT1 we used the binding consensus motif proposed by the authors (30). The observed TFBS positions were mapped with respect to the TSS. PG4-TFBS inter-distance values were then calculated and grouped in bins of 100 bases to obtain the respective frequencies.

To assess similarity for SP1, NF-Y and STAT1, we compared TRANSFAC-predicted and ChIP-on-chip/ChIP-seq binding sites in terms of their inter-distances from conserved PG4 motifs. We first plotted the frequency distribution of the TFBSs with respect to the PG4 motif (5' base) position (Supplementary Figure S1). We also calculated the correlation coefficient between the two data sets and its statistical significance.

Co-expression and significance analysis 

We checked, in 68 human tissues (31), the tissue specificity of genes harboring a PG4 motif and a TFBS within a 100-base window on the conserved-set promoters. The analysis was largely based on a previously described method (21). The expression data of each gene across all tissues were first normalized to mean 0 and variance 1. The genes were then ranked by their normalized expression level in each tissue, generating 68 tissue-specific ranked gene lists. For each TFBS that significantly co-occurred with PG4, we generated two distinct gene sets:

- Set A: genes with the TFBS within 100 bases of a conserved PG4 motif. It includes 785, 320, 456, 420, 372, 369, 384, 296 and 253 genes for Kid3, KROX, AP-2, SP1, ETF, MAZ, VDR, ZF5 and WT1, respectively.
- Set B: genes where the respective TFBSs were found beyond ±100 bases of the conserved PG4 motif. It includes 86, 228, 332, 280, 324, 344, 385, 296 and 262 genes for Kid3, KROX, AP-2, SP1, ETF, MAZ, VDR, ZF5 and WT1, respectively.

For a given gene set S and a particular tissue, we analyzed expression enrichment and its significance from the whole ranked list of genes T for that tissue. We evaluated the non-randomness of the ranks of S within T using the Mann-Whitney rank-sum statistic. After summing the ranks of S in list T, we tested the significance of this rank sum against the rank sums of control sets. The control sets were 10 random sets of the same cardinality drawn from all genes in T, excluding S. Let μ and σ2 be the mean and variance of the control-set rank sums. The enrichment (z-score) of S is then (μ − S)/σ, where S here denotes the rank sum of the set. This measures enrichment as the number of standard deviations away from the mean of the control sets. A z-score >4.0 was considered significant in the present study.
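The normalization, ranking and rank-sum z-score can be sketched in Python as follows. This is a sketch under stated assumptions, not the published pipeline:

- Rank 1 is taken to be the highest normalized expression. An enriched set therefore has a small rank sum and a positive (μ − S)/σ.
- Genes with constant profiles are dropped to avoid division by zero.
- The 10 control sets are drawn with a seeded random generator.

All function names are hypothetical.

```python
import random
import statistics


def normalize_profiles(expr):
    """Scale each gene's expression across tissues to mean 0 and variance 1."""
    norm = {}
    for gene, values in expr.items():
        mu, sd = statistics.fmean(values), statistics.pstdev(values)
        if sd > 0:  # a constant profile carries no tissue signal (and would divide by zero)
            norm[gene] = [(v - mu) / sd for v in values]
    return norm


def tissue_ranks(norm, tissue):
    """Rank genes within one tissue column; rank 1 = highest normalized expression."""
    ordered = sorted(norm, key=lambda gene: norm[gene][tissue], reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def rank_sum_z(ranks, gene_set, n_controls=10, seed=0):
    """z = (mu - S) / sigma for the rank sum S of gene_set against random control sets."""
    members = [g for g in gene_set if g in ranks]
    s = sum(ranks[g] for g in members)
    pool = [g for g in ranks if g not in gene_set]   # all genes in T, excluding S
    rng = random.Random(seed)
    controls = [sum(ranks[g] for g in rng.sample(pool, len(members)))
                for _ in range(n_controls)]
    mu, sigma = statistics.fmean(controls), statistics.stdev(controls)
    return (mu - s) / sigma


# Toy data: 30 genes peaking in tissue 0, and 300 genes that do not.
expr = {f"s{i}": [10.0, 0.0, 0.0] for i in range(30)}
expr.update({f"o{i}": [0.0, float(i % 5 + 1), float(i % 3 + 1)] for i in range(300)})
ranks = tissue_ranks(normalize_profiles(expr), tissue=0)
z = rank_sum_z(ranks, {f"s{i}" for i in range(30)})
print(z > 4.0)  # True
```

Repeating `tissue_ranks` and `rank_sum_z` for each of the 68 tissue columns, and for set A and set B of each TF, would give the per-tissue z-scores described above.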






RESULTS 

More than 40 TFBS-PG4 motif associations are 
conserved across human, chimpanzee, mouse 
and rat promoters genome-wide 

We found 5005 human-chimpanzee, 4929 human-mouse and 2263 human-rat promoter pairs with PG4 motifs. Of these, 871 promoters harbored at least one conserved PG4 motif (i.e. present in all four organisms). These promoters contained in total 1563, 1666, 1459 and 1350 PG4 motifs in human, chimpanzee, mouse and rat, respectively (Table 1). We reasoned that the 871 promoters harboring one or more conserved PG4 motifs were the most likely to be functionally relevant for PG4 motif-mediated transcription. To check the potential importance of genes harboring conserved PG4 motif(s), we performed KEGG pathway analysis using the web-based tool GeneCodis (32). Significant over-representation (P < 6.8E-05, after correction for multiple hypothesis testing) was found in MAPK signaling, regulation of actin cytoskeleton, focal adhesion, TGF-β signaling, Wnt signaling and apoptosis (Supplementary Table S1).

Next, we asked which TFBS were predominant within the promoters harboring conserved PG4 motifs. To analyze TFBS enrichment statistically, we considered the 871 PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R promoters. These were compared with control sets of 19 793 human, 19 730 chimpanzee, 18 785 mouse and 14 292 rat promoter sequences that were devoid of any conserved PG4 motif (Figure 3). This revealed 120 622, 112 351, 112 855 and 108 669 binding sites for 184, 184, 180 and 181 different TFs on the PG4CP-H, PG4CP-C, PG4CP-M and PG4CP-R sets, respectively. At a significance level of P < 0.005, we obtained target sites for 63 TFs in human, 60 in chimpanzee, 63 in mouse and 60 in rat. Of these, 45 TFs were common to all four species (Supplementary Table S2), indicating that many TF target sites were significantly enriched in association with PG4 motifs.

Target sites of seven zinc-finger TFs significantly co-occur with PG4 motifs

Next, we checked whether the association of PG4 motifs with TFBS had any particular distribution with respect to their relative positioning within a promoter. We mapped the inter-distance between all conserved PG4 motifs and the TFBS of each of the 45 TFs within the 871 conserved-set promoters, independently for human, chimpanzee, mouse and rat. The results were represented as percentage frequency (fraction of all associations per TFBS) for each PG4-TFBS combination in windows of 100 bases (Figure 4). For any particular PG4-TFBS combination, the inter-distance distribution was largely distinct. Moreover, the respective distributions were very similar in all four species. Interestingly, many TFBS either overlapped or were within ±100 bases of the conserved PG4 motif.

Given the potentially important implications of this, we analyzed the statistical significance of co-occurrence for PG4-TFBS pairs within an inter-distance of 100 bases, using a previously published method (28). This gave target sites of 21, 16, 12 and 11 TF-PG4 combinations in human, chimpanzee, mouse and rat, respectively. Of these, TFBS for nine factors were common to all four species (Table 2). We noted with interest that seven of these nine factors had the zinc-finger motif, particularly the cysteine2-histidine2 (C2H2) domain (Table 3). The seven are SP1, MAZ, WT1, KROX (EGR-2), Kid3 (ZNF354C), ZF5 (ZFP161) and VDR. This was also true for many of the TFs that co-occur within PG4-harboring promoters (Supplementary Table S3). Consistent with this finding, an earlier study found that a large number of upstream PG4 motifs are enriched with target sites of SP1 (33).

Zinc-finger factors, particularly those of the Cys2-His2 type, represent a significant fraction of all TFs. Rigorous statistical corrections were applied with this in mind (Figure 3 and see 'Materials and Methods' section). Nevertheless, we further considered whether the associations could be artifacts arising merely from the high numbers of such factors. Of the 187 unique TFs studied here, 33 (0.18) were Cys2-His2 zinc fingers. We found six Cys2-His2-type zinc fingers among the nine factors associated with PG4 motifs, a fraction of 0.67 (P < 0.001; two-tailed Fisher's exact test). This suggests an enrichment greater than expected by chance. By the same reasoning, other TF classes with high numbers would be expected to show more association with PG4 motifs. This was not the case: 21 of the 187 TFs were leucine zipper factors, yet none of these was found to be associated with PG4 motifs in our analysis.

Experimentally determined gene expression reveals a role for PG4-zinc-finger associations

To test the physiological significance of the above findings, we turned to gene expression analysis of human cells treated with the cationic porphyrin ligand TMPyP4, which binds selectively to G-quadruplex motifs inside cells (15). In a previous study, we demonstrated that TMPyP4 and other ligands that selectively bind G-quadruplexes produce similar genome-wide expression changes (20). These changes were largely consistent with the presence of the quadruplex motif in promoters, though other secondary mechanisms could also influence the transcriptome. We analyzed the gene expression datasets from lung adenocarcinoma (A549) and cervical carcinoma (HeLa S3) cells to determine whether genes having PG4-TFBS associations showed significant changes in expression.

Table 1. Distribution of PG4 motifs near TSS

Organism | ORFs studied | Total no. of PG4 motifs in promoters (a) | Promoters with at least one PG4 motif | Conserved PG4 motifs in 871 orthologously related promoters (b)
Human | 20 664 | 50 939 | 14 836 | 1563
Chimpanzee | 20 601 | 41 811 | 14 184 | 1666
Mouse | 19 656 | 33 738 | 13 738 | 1459
Rat | 15 163 | 20 148 | 9470 | 1350

(a) ±2 kb centered at TSS.
(b) Human, chimpanzee, mouse and rat.

Figure 3. Strategy followed for genome-wide comparative analysis to identify enriched presence of TFBS within promoters harboring conserved PG4 motifs in human, chimpanzee, mouse and rat.

We found 66 genes harboring conserved PG4 motifs within ±2 kb of the TSS that were significantly differentially expressed after treatment with TMPyP4 (FDR cutoff <20%). These changes were consistent across four replicates in both cell lines (Figure 5). Next, we asked whether target sites for any of the nine TFs, including the seven zinc-finger factors, were present along with the conserved PG4 motifs in the 66 genes. Interestingly, each of the 66 genes showed significantly enriched occurrence of these target sites relative to the randomly expected chance of occurrence of each target site (P ≤ 1.02E-05). The numbers of differentially expressed genes that harbor significant PG4-TFBS combinations are given in Table 4. Figure 5 shows the expression arrays representing all 66 differentially expressed genes after replicate treatments with TMPyP4. It also shows the corresponding promoters, with the relative positions of the TFBS and conserved PG4 motif. Furthermore, in each of the 66 genes, target sites of one or more of the nine TFs were present within ±100 bases of the conserved PG4 motif.

Together, this suggests a widespread functional role of PG4 motifs in gene expression. However, the selectivity of TMPyP4 for G-quadruplex DNA relative to duplex DNA is modest. The gene expression results reported earlier (20) were additionally validated using more selective G-quadruplex-binding ligands. These were the carbazole derivative BMVC (20) and a second cationic porphyrin (TpPy) (20). Even so, these findings will need to be tested further for individual genes.

Figure 4. Zinc-finger TFBS are closely associated with PG4 motifs. Distribution of TFBS with respect to the PG4 motif (PG4-TFBS inter-distance) in promoters of human, chimpanzee, mouse and rat that harbor conserved PG4 motifs. Pseudocolor represents the percentage frequency of PG4-TFBS inter-distance values in bins of 100 bases relative to the nearest PG4 motif. Asterisk represents additional zinc-finger TFs found in the study.

PG4-TFBS combinations from in vivo genome-wide ChIP-seq and ChIP-on-chip data

We found the PG4-TFBS associations using TRANSFAC motifs (PWMs). These are built from both functional and predicted target sites and constitute all possible genomic occurrences. We therefore attempted to validate our predictions using genome-wide, experimentally determined TFBS for three of the 45 TFs enriched within conserved-set promoters: SP1, NF-Y and STAT1.

- SP1: we found 402 combinations on 63 promoters where a conserved PG4 motif and an experimentally determined SP1 site occurred within ±2 kb of the TSS. Interestingly, the frequency distribution of SP1 sites with respect to the PG4 motif in the ChIP-chip data was very similar to that observed with TFBS determined from TRANSFAC (Supplementary Figure S1a). The TRANSFAC analysis gave 6520 PG4-SP1 combinations on 700 promoters (r = 0.93; P < 0.0001). The functional SP1 sites were, however, only a fraction of those found with TRANSFAC.
- NF-Y: a similar analysis gave 113 conserved PG4-NF-Y combinations in 30 ChIP-chip-identified promoters. We compared the distribution of these NF-Y sites (with respect to PG4 motifs) with 479 PG4-NF-Y combinations on 118 promoters from the TRANSFAC analysis. The two distributions were significantly similar (r = 0.76; P < 0.0001; Supplementary Figure S1b).
- STAT1: we observed 325 PG4-STAT1 combinations on 82 promoters reported by ChIP-seq experiments. Comparison with 987 PG4-STAT1 combinations on 210 promoters from TRANSFAC again gave a very similar frequency distribution (r = 0.86; P < 0.0001; Supplementary Figure S1c).

These results further indicate that the PG4-TFBS co-occurrences observed using TRANSFAC data are likely to hold in functional cases. The functional set is, however, expected in most cases to be limited, for a variety of reasons. These include chromatin compaction, which limits the presentation of all available TFBS, and co-factor requirements for TF binding, which are expected to be context dependent.

Table 2. Significance of PG4-TFBS co-occurrence within ±100 bp in promoters with conserved PG4 motifs in human, chimpanzee, mouse and rat

# | TF name | Human P-value | Chimpanzee P-value | Mouse P-value | Rat P-value
1 | SP1 | <E-300 (a) | 2.43E-143 | 1.20E-76 | 1.36E-41
2 | WT1 | <E-300 (a) | 5.31E-41 | 7.70E-11 | 1.24E-10
3 | KROX | <E-300 (a) | 2.20E-31 | 1.35E-48 | 1.56E-20
4 | MAZ | <E-300 (a) | 1.59E-105 | 3.56E-57 | 2.35E-18
5 | VDR | 1.24E-262 | 1.40E-269 | 1.46E-19 | 1.44E-11
6 | Kid3 | <E-300 (a) | <E-300 (a) | <E-300 (a) | <E-300 (a)
7 | ZF5 | <E-300 | <E-300 (a) | 1.48E-190 | 8.27E-139
8 | ETF | <E-300 (a) | 6.33E-268 | 1.03E-123 | 4.06E-81
9 | AP-2 | <E-300 (a) | <E-300 (a) | 4.04E-69 | 2.84E-74

(a) Indicates value <E-310.



Table 3. Functional annotation of TFBS significantly co-occurring with conserved PG4 motifs (within 100 bases)

TF name | Classification of TF | Involvement in biological processes/pathways (a) | Key genes regulated by TF
SP1 | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain, ubiquitous factors | Cell cycle; MAPK signaling; TGF-β signaling | CDK1, CDK2, CDK4, CCND2, IL-10, c-MET
VDR | Zinc-coordinating DNA binding domains, Cys4 zinc finger of nuclear receptor type, thyroid hormone receptor-like factors | Cell-cycle progression, proliferation and growth; osteoblastic differentiation | TCTP, p73, BRCA1
KROX | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain, cell-cycle regulators | Cell cycle, apoptosis | BNIP3L, BAK, EFNA1, SFN
WT1 | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain, cell-cycle regulators, GLI-like | Cell cycle, MAPK signaling, apoptosis | CCNA1, p21, BCL-2
MAZ | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain | Cell cycle, apoptosis, lymphocyte development, neural differentiation | c-MYC, PPARγ1, BCL-2, RAG-2, DCC
Kid3 | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain, Krueppel-like | Kidney and brain development | HP1α, MOD1, MOD2
ZF5 | Zinc-coordinating DNA binding domains, C2H2 zinc-finger domain, Krueppel-like | Cell cycle, cell proliferation, induction of programmed cell death | c-MYC, TK1
AP-2 | Basic domains, bHSH | Cell cycle, TGF-β signaling, MAPK signaling | ESDN, EREG, CXCL2, CDKN1A, COX-2
ETF | Helix-turn-helix, TEA domain | Cell cycle | P53

(a) Relevant references showing involvement of particular TF in biological processes/pathways are given in Supplementary Table S2.
Table 4. TFBS enriched on promoters harboring conserved PG4 motifs that are differentially expressed in A549 and HeLa S3 cells after treatment with a G-quadruplex-binding ligand

TFBS associated with conserved PG4 motif (within 100 bases) | Differentially expressed genes with PG4-TFBS in promoters | P-value of PG4-TFBS enrichment in promoters
AP2 | 34 | 2.05E-40
ETF | 28 | 4.61E-53
Kid3 | 58 | 1.64E-09
KROX | 19 | 3.29E-13
MAZ | 24 | 5.55E-13
SP1 | 27 | 1.88E-25
VDR | IS | 1.02E-05
WT1 | 20 | 5.31E-11
ZF5 | 41 | 2.68E-44


Genes having PG4-zinc-finger associations are co-expressed

To further test the regulatory significance of these associations, we analyzed the transcriptome profiles of normal human tissues for genes with TFBS-PG4 pairs in their promoters. The reasoning was that if a TFBS acting in association with the PG4 motif regulates a group of genes, that group is likely to show a significantly enriched (or altered) expression response, either up- or downregulation, within specific tissues relative to randomly picked genes. For each of the nine PG4-TFBS associations found above, two groups of genes were analyzed: genes harboring the TFBS either within ±100 bases (set A) or beyond ±100 bases (set B) of a conserved PG4 motif (see 'Materials and Methods' section). Using gene expression data from 68 normal human tissues, we observed a significantly enriched expression response (z-score >4.0; see 'Materials and Methods' section for details of the statistical analysis) for set A in all nine cases, in most tissues (Supplementary Figure S2). This was also true in many cases for the genes in set B. Interestingly, for Kid3 (ZNF354C), KROX (EGR2), SP1 and AP-2, the expression response of set A was largely distinct from that of set B, suggesting that close proximity of the TFBS to the PG4 motif may be functionally relevant. In contrast, for MAZ, ETF, ZF5 (ZFP161), WT1 and VDR, z-scores were similar in sets A and B, suggesting that co-occurrence of the PG4 motif and the TFBS within the same promoter, in addition to their close proximity, is important for gene expression.
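The exact statistical procedure behind the z-scores is given in 'Materials and Methods'. Purely as an illustration, the sketch below assumes one common form of such a test: the mean expression of a gene set (for example, set A for one PG4-TFBS pair) in a single tissue is compared with the means of many randomly picked gene sets of the same size, and the difference is expressed as a z-score. The function name `set_zscore`, the use of the mean, the number of random draws and the toy data are all assumptions for illustration, not the authors' pipeline.

```python
import random
import statistics


def set_zscore(expression, gene_set, n_random=1000, seed=0):
    """Z-score of a gene set's mean expression in one tissue versus random gene sets.

    expression: dict mapping gene name -> expression value in one tissue
                (hypothetical input format)
    gene_set:   genes of interest, e.g. set A or set B for one PG4-TFBS pair
    """
    rng = random.Random(seed)
    all_genes = list(expression)
    k = len(gene_set)
    observed = statistics.fmean(expression[g] for g in gene_set)
    # Null distribution: mean expression of randomly picked gene sets of equal size
    null_means = [
        statistics.fmean(expression[g] for g in rng.sample(all_genes, k))
        for _ in range(n_random)
    ]
    mu = statistics.fmean(null_means)
    sd = statistics.stdev(null_means)
    return (observed - mu) / sd


# Toy data (invented for illustration): 1,000 background genes plus 20 genes
# whose expression is uniformly elevated in this tissue.
data_rng = random.Random(1)
expression = {f"g{i}": data_rng.gauss(0.0, 1.0) for i in range(1000)}
expression.update({f"pg4_{i}": 3.0 for i in range(20)})
set_a = [f"pg4_{i}" for i in range(20)]

z = set_zscore(expression, set_a)
print(f"z = {z:.1f}; enriched at the z > 4.0 threshold: {z > 4.0}")
```

Repeating this per tissue and per gene set gives a tissue-by-set matrix of z-scores of the kind summarized in Supplementary Figure S2; a permutation-style null such as this avoids assuming any particular distribution for expression values.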



DISCUSSION 

We found that target sites of 45 of the 187 TFs analyzed are enriched in promoters harboring PG4 motifs. The functional importance of this is implied by the fact that binding sites of all 45 TFs, together with PG4 motif occurrences, were maintained in orthologously related promoters across four organisms. Remarkably, target sites of nine TFs, including seven zinc-finger factors, occurred predominantly within 100 bases of a PG4 motif; again, this was observed across all four vertebrate lineages. These observations were confirmed by analyzing transcriptome data generated using a ligand that binds G-quadruplex motifs inside cells: more than 60 genes that significantly changed expression on ligand treatment in two cell lines of different origin harbored closely associated PG4 motifs and zinc-finger target sites. Finally, genes with PG4-TFBS associations in their promoters showed significant co-regulation in transcriptome profiles of 68 human tissues, supporting the functional relevance of the G-quadruplex-TFBS associations. Taken together, these findings strongly indicate a genome-wide regulatory role for PG4-TFBS associations, suggest that close proximity is important in specific cases, and point to a broader role for G-quadruplex elements in transcriptional regulation.
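The 100-base proximity criterion used throughout (and to define sets A and B in the co-expression analysis) can be illustrated with a small sketch. It assumes that PG4 motifs and TFBS hits are available as inclusive (start, end) intervals relative to the TSS, and that "within 100 bases" means the distance between the nearest ends of the two intervals is at most 100 bases; the exact distance definition used in the study is given in 'Materials and Methods' and may differ. The names `gap` and `classify_promoter` and the toy coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end) in bases relative to the TSS, inclusive.
Interval = Tuple[int, int]


def gap(a: Interval, b: Interval) -> int:
    """Distance between the nearest ends of two intervals (0 if they overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def classify_promoter(pg4_sites: List[Interval], tfbs_sites: List[Interval],
                      window: int = 100) -> Optional[str]:
    """'A' if any TFBS lies within `window` bases of any PG4 motif,
    'B' if PG4 motif and TFBS co-occur only farther apart,
    None if the promoter lacks either element."""
    if not pg4_sites or not tfbs_sites:
        return None
    nearest = min(gap(p, t) for p in pg4_sites for t in tfbs_sites)
    return "A" if nearest <= window else "B"


# Toy promoters (invented coordinates, for illustration only)
promoters = {
    "gene1": ([(-250, -230)], [(-200, -190)]),  # 30 bases apart
    "gene2": ([(-900, -880)], [(-150, -140)]),  # 730 bases apart
    "gene3": ([], [(-60, -50)]),                # no PG4 motif
}
for gene, (pg4, tfbs) in promoters.items():
    print(gene, classify_promoter(pg4, tfbs))
# gene1 A
# gene2 B
# gene3 None
```

Taking the minimum over all PG4-TFBS pairs means a promoter is placed in set A as soon as any single TFBS hit falls inside the window, even if other hits lie farther away.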

Recently, Cogoi et al. (23) reported an interaction between the Myc-associated zinc-finger protein MAZ and the G-quadruplex motif in the promoter of murine KRAS, resulting in activation of KRAS expression. Using pull-down and ChIP assays, they demonstrated in vivo binding of MAZ to the quadruplex-forming element in the murine KRAS promoter (23). Also notable in this context is the approach of Isalan et al. (34), who used a phage display-based technique to search for protein factors that bind the telomeric quadruplex motif and identified an engineered Cys2-His2 zinc-finger protein that was both sequence- and structure-specific for this motif (35). These independent studies of particular quadruplex-zinc-finger interactions support the findings reported here from an unbiased, genome-scale study.

TF-PG4 motif associations as putative regulators of cell-cycle-related genes

Interestingly, all nine TFs whose target sites significantly co-occur with conserved PG4 motifs in the promoters of human, chimpanzee, mouse and rat have been implicated in cell-cycle progression. For example, Tapias et al. have shown that SP1 regulates cell-cycle progression through CDK4 and CDKN1A/p21 interaction (36). In addition, we noted several instances where a cell-cycle gene could potentially be regulated by PG4-TFBS combinations in its promoter (see Supplementary Data for details). Based on these observations, it is tempting to speculate that PG4 motifs, in association with TFs, may influence cell-cycle-related cellular functions. Although this appears consistent with observations made in a recent study [vide infra (37)], further work will be required to test this possibility directly.

G-quadruplex ligand interactions affect non-telomeric functions

Reduced cell proliferation, particularly in tumors, has been reported using various G-quadruplex-binding ligands, such as the cationic porphyrin TMPyP4 [tetra(N-methyl-4-pyridyl)-porphyrin chloride], the papaverine-derived ligands 6a,12a-diazadibenzo-[a,g]fluorenylium and 2,3,9,10-tetramethoxy-12-oxo-12H-indolo[2,1-a]isoquinolinium chloride, BRACO-19 (a 3,6,9-trisubstituted acridine ligand), RHPS4 [3,11-difluoro-6,8,13-trimethyl-8H-quino(4,3,2-kl)acridinium methosulphate] and telomestatin (38,39). Grand et al. (40) showed reduced tumor growth due to decreased expression of c-MYC in the presence of TMPyP4; c-MYC downregulation in turn reduces expression of hTERT and various other genes that together regulate telomere length and thereby enhance the proliferative capacity of the cell. Based on our current results, it is possible that the effects on cell proliferation and cell-cycle regulation observed with these ligands are due in part to interaction with G-quadruplex motifs in promoters, which would alter TF-dependent regulatory mechanisms, in addition to inhibition of telomerase or telomeric DNA amplification (41) through binding to telomeric G-quadruplex motifs. A recent study of HXDV, a synthetic hexaoxazole macrocycle analog of telomestatin, showed anti-proliferative activity and inhibition of cell-cycle progression, leading to M-phase arrest attributed to the specific G-quadruplex binding affinity of HXDV inside cells (37). Interestingly, this M-phase arrest was independent of telomerase status, occurring also in telomerase-negative cells. This is consistent with our findings, which suggest that disruption of promoter G-quadruplexes and their associated TF interactions leads to arrest of cell-cycle progression.

Figure 5. Genes harboring conserved PG4-TFBS associations are differentially expressed in the presence of a G-quadruplex-binding ligand. Left panel: expression profiles of genes with conserved PG4 motifs that are significantly differentially expressed in both cell lines on ligand treatment; pseudocolor represents relative expression values in HeLaS3 and A549 cells. Right panel: association of PG4 motifs with TFBS in the promoters of the differentially expressed genes shown in the left panel; the black box represents the PG4 motif, and the associated pseudocolor shows the number of TFBS within 100-base windows relative to the TSS.

G-quadruplex DNA and zinc-finger proteins as binding pairs

The versatility of the zinc-finger binding pocket has been widely studied. The modular nature of the pocket, and the variety of DNA (and RNA) elements that zinc-finger factors recognize within a generic recognition code, are intriguing (42,43). Interestingly, this has led to the development of tailor-made nucleases that exploit the specificity between a given DNA sequence and its best-fit zinc-finger binding domain (44). G-quadruplex research, in turn, increasingly points to the possibility of a vast number of structural motifs. Moreover, emerging reports describe biological roles in which variety within the G-quadruplex structural domain is evident (38). In this context, it is interesting to consider G-quadruplex-zinc-finger interactions as pairs in which the modular variation possible in both the DNA and the protein domains can be exploited. With this in mind, it is tempting to speculate that the cis (G-quadruplex) and trans (zinc-finger) components of these interactions have co-evolved to make the best use of domain variation in both the DNA and the cognate protein binding domain. Second, the large number of G-quadruplexes in the genome raises considerable doubt about how many are functional. Context-dependent use of G-quadruplex motifs is a plausible answer, presumably through trans-associations with protein factors that extrude or bind G-quadruplexes in a cell-type- or state-specific fashion. Zinc-finger factors appear well suited for this purpose, owing both to their context-dependent presence and to their domain variation within a general theme, which underscores the implications of the PG4 motif-zinc-finger associations found in this study.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 